MUD: Mapping-based query processing for high-dimensional uncertain data
Authors
Abstract
Many real-world applications require management of uncertain data that are modeled as objects in high-dimensional space with imprecise values. In such applications, data objects are typically associated with probability density functions. A fundamental operation on such uncertain data is the probabilistic-threshold range query (PTRQ), which retrieves the objects appearing in the query region with probabilities no less than a specified value. In this paper, we propose a novel framework called MUD for efficient processing of PTRQs on high-dimensional uncertain data. We first propose a cost-effective pruning technique based on a very simple form of probabilistic pruning information (PPI), namely the probabilistic quantiles. Then we map high-dimensional uncertain objects to a single-dimensional space, where the quantiles of uncertain objects can be indexed using the existing single-dimensional indices such as the B+tree. Each PTRQ in the high-dimensional space is transformed into multiple range queries on the single-dimensional space and evaluated there. We also discuss a method to optimize the indexing scheme for MUD. Specifically, we formulate a mathematical model for measuring the “pruning power” of quantiles, and propose a dynamic programming algorithm which selects the “best” quantiles for mapping and indexing. We perform extensive experiments on both synthetic and real data sets. Our experimental results reveal that the MUD framework is both effective and efficient for processing PTRQs on high-dimensional uncertain data, and it can significantly outperform state-of-the-art schemes.
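The quantile-based pruning idea described above can be illustrated with a minimal sketch. It assumes each uncertain object is given as a set of equally weighted samples and that the query region is an axis-aligned box. All function names are hypothetical. The abstract does not specify MUD's mapping to the single-dimensional space, its B+tree layout, or its quantile-selection algorithm, so none of those are reproduced here; the sketch only shows why a single per-dimension quantile can safely discard an object before its appearance probability is computed.

```python
import math
import random


def quantile_bounds(samples, p):
    """Per-dimension (low, high) values such that at most a fraction p of the
    samples lies strictly below `low` and at most p lies strictly above `high`.
    Assumes 0 <= p < 1 and equally weighted samples (an assumption of this sketch)."""
    n = len(samples)
    k = math.floor(p * n)
    bounds = []
    for d in range(len(samples[0])):
        col = sorted(s[d] for s in samples)
        bounds.append((col[k], col[n - 1 - k]))
    return bounds


def can_prune(bounds, p, query, tau):
    """True if the quantiles alone prove the appearance probability is below tau.
    `query` is a list of (lo, hi) intervals, one per dimension."""
    if p >= tau:
        return False  # a p-quantile can only certify probabilities <= p
    for (low_q, high_q), (q_lo, q_hi) in zip(bounds, query):
        # The query lies entirely in a tail that holds at most p of the mass.
        if q_hi < low_q or q_lo > high_q:
            return True
    return False


def appearance_probability(samples, query):
    """Fraction of samples that fall inside the query box (the costly exact step)."""
    inside = sum(
        all(lo <= x <= hi for x, (lo, hi) in zip(s, query)) for s in samples
    )
    return inside / len(samples)


def ptrq(objects, query, tau, p=0.1):
    """Probabilistic-threshold range query: prune with quantiles, then verify."""
    result = []
    for name, samples in objects.items():
        bounds = quantile_bounds(samples, p)  # would be precomputed and indexed
        if can_prune(bounds, p, query, tau):
            continue
        if appearance_probability(samples, query) >= tau:
            result.append(name)
    return result


# Hypothetical data: two 4-dimensional objects with 50 samples each.
rng = random.Random(0)
objects = {
    "a": [[rng.uniform(0, 1) for _ in range(4)] for _ in range(50)],
    "b": [[rng.uniform(5, 6) for _ in range(4)] for _ in range(50)],
}
query = [(0.0, 1.0)] * 4
print(ptrq(objects, query, tau=0.5))  # → ['a']
```

Object "b" is discarded by the quantile test alone, without counting its samples, because the query box lies entirely below its 0.1-quantile in every dimension. In this sketch the quantile level `p` is fixed by hand; the paper instead selects the quantiles with a dynamic programming algorithm.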
Similar resources
Dynamic High Dimensional Data Mapping for Efficient Similarity Query Processing
For efficient processing of similarity queries, the search space is often reduced by pruning inactive query subspaces which do not contain any query results so only those active query subspaces which may contain query results are examined. Among the active query subspaces, however, not all of them contain query results; an active query subspace that later turns out to contain no query results a...
Generalized Uncertain Databases: First Steps
Existing uncertain databases have difficulty managing data when exact confidence values or probabilities are not available. Confidence values may be known imprecisely or coarsely, or even be missing altogether. We propose a generalized uncertain database that can manage data with such incomplete knowledge of uncertainty. We develop a semantics for generalized uncertain databases based on Dempst...
Towards Special-Purpose Indexes and Statistics for Uncertain Data
The Trio project at Stanford [35] for managing data, uncertainty, and lineage is developed on top of a conventional DBMS. Uncertain data with lineage is encoded in relational tables, and Trio queries are translated to SQL queries on the encoding. Such a layered approach reaps significant benefits in terms of architectural simplicity, and the ability to use an off-the-shelf query processing engi...
Processing Probabilistic Range Queries over Gaussian-Based Uncertain Data
Probabilistic range query is an important type of query in the area of uncertain data management. A probabilistic range query returns all the objects within a specific range from the query object with a probability no less than a given threshold. In this paper we assume that each uncertain object stored in the databases is associated with a multi-dimensional Gaussian distribution, which describ...
Selecting the Most Appropriate Query Language for Using Outer Joins to Extract Data in Datalog Mode in the DES Deductive Database System
Deductive database systems are designed around a logical data model. Unlike a relational database management system (RDBMS), which stores data in tables, a deductive database system stores data as facts. The Datalog Educational System (DES) is a deductive database system whose default mode is Datalog. It can extract data using outer joins with three query la...
Journal title:
- Inf. Sci.
Volume 198, issue
Pages -
Publication date 2012